Conversation

@kareemshaik80

- Support for a single sink logit in flash attention decode
- Add sink to softmax
- Command-line flag added to enable the attention sink

Signed-off-by: kareem <[email protected]>
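
For context: the sink here is one extra learnable logit that joins the softmax denominator without a corresponding V row, so the attention weights over the real keys sum to less than one. A minimal NumPy sketch of that math (function and variable names are illustrative, not from this PR):

```python
import numpy as np

def sink_softmax(logits, sink):
    """Numerically stable softmax with one extra sink logit: the sink
    enlarges the denominator but has no value row, so the returned
    weights over the real keys sum to less than 1."""
    m = max(logits.max(), sink)              # include the sink in the max
    e = np.exp(logits - m)
    return e / (e.sum() + np.exp(sink - m))
```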
@kareemshaik80 kareemshaik80 marked this pull request as draft September 25, 2025 07:38
@kareemshaik80 kareemshaik80 marked this pull request as ready for review September 25, 2025 07:39

@yuankuns yuankuns left a comment

Also needs a paper/code reference to confirm that this PR does what is intended.


@yuankuns yuankuns left a comment

Not changed.

@kareemshaik80
Author

> Also needs a paper/code reference to confirm that this PR does what is intended.

You can refer to this paper (StreamingLLM, which introduced attention sinks): https://arxiv.org/pdf/2309.17453

Eager code: https://github.com/huggingface/transformers/blob/caa14e7dabb086f167c14b7eecadc2ba9db25eb6/src/transformers/models/gpt_oss/modeling_gpt_oss.py#L258
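
Roughly, that eager path treats the per-head sink as one extra key column: it concatenates the sink to the attention logits, softmaxes over the extended axis, and drops the sink column before the P @ V product. A PyTorch sketch of the pattern (shapes and names are illustrative; see the link above for the exact code):

```python
import torch
import torch.nn.functional as F

def eager_sink_attention_weights(attn_logits, sinks):
    # attn_logits: [batch, heads, q_len, k_len]; sinks: [heads]
    b, h, q, _ = attn_logits.shape
    sink_col = sinks.reshape(1, h, 1, 1).expand(b, h, q, 1)
    combined = torch.cat([attn_logits, sink_col], dim=-1)
    probs = F.softmax(combined, dim=-1)  # softmax over keys + sink
    return probs[..., :-1]               # drop the sink column before P @ V
```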

@kareemshaik80 kareemshaik80 requested a review from yuankuns October 7, 2025 04:01
Author

@kareemshaik80 kareemshaik80 left a comment

Move test under unit tests.

@Antonyvance

@kareemshaik80 I believe this implementation needs to change based on PR #547.

@Antonyvance Antonyvance added the "redesign required" label Oct 17, 2025
@sunjiweiswift

sunjiweiswift commented Oct 22, 2025

The calculation here is incorrect.

  1. We use exp2, so the sink logit also needs to be multiplied by log2(e) (constexpr double kLog2e = 1.4426950408889634074);

  2. In online softmax, it's more appropriate to process the sink in stage 2, as is done in Triton (https://github.com/openai/gpt-oss/blob/0a9ec7f69d8aa71841c5cefcd84a512344b9f1be/gpt_oss/triton/attention.py#L94C4-L100C46). Introducing the sink in stage 1 is incorrect; it will change the GEMM results for V. See the sketch after this comment.

I've completed this functionality in CUTLASS: f709e32

It is recommended to use a more stringent unit test to check the results.
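
A minimal NumPy sketch of the stage-2 handling described in point 2, mirroring the Triton reference: exp2 with the log2(e) scale, and the sink folded into the running max and normalizer only after the V-accumulation loop, so it never enters the P @ V GEMM (names are illustrative):

```python
import numpy as np

LOG2E = 1.4426950408889634074  # kLog2e: exp(x) == exp2(x * LOG2E)

def decode_attention_with_sink(scores, v, sink):
    # scores: [k_len] logits for one query row; v: [k_len, d]; sink: scalar
    m, l = -np.inf, 0.0                       # running max and normalizer
    acc = np.zeros(v.shape[1])
    for s, vi in zip(scores, v):              # stage 1: online softmax over K/V
        m_new = max(m, s)
        alpha = 2.0 ** ((m - m_new) * LOG2E)  # rescale old stats to the new max
        p = 2.0 ** ((s - m_new) * LOG2E)
        l = l * alpha + p
        acc = acc * alpha + p * vi
        m = m_new
    # stage 2: fold the sink into the max/normalizer only -- acc is untouched
    m_fin = max(m, sink)
    alpha = 2.0 ** ((m - m_fin) * LOG2E)
    l = l * alpha + 2.0 ** ((sink - m_fin) * LOG2E)
    return (acc * alpha) / l
```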
